Fix `linalg.matmul_transpose_b` for big tiles #410

dchigarev · 2024-11-15T15:00:49Z

It was discovered that `linalg.matmul_transpose_b` is lowered to XeGPU incorrectly for large tiles. The issue is caused by a `std::swap` placed inside a nested for-loop: instead of swapping `rowOffs` and `colOffs` only once, it performs the swap on every iteration, which produces incorrect offsets.

Signed-off-by: dchigarev <[email protected]>

dchigarev · 2024-11-15T15:02:05Z

lib/gc/Transforms/GPU/LinalgToXeGPU.cpp

-      if (transpose) {
-        std::swap(newRowOffs, newColOffs);
-      }


Placing `std::swap` inside a nested for-loop was a bad idea: it swaps the values on every iteration, producing nonsense offsets in the end.
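The following is a minimal, self-contained C++ sketch of the pattern. It shows why the bug only surfaced with larger tiles. This is not the code from `LinalgToXeGPU.cpp`: `Offs`, `buggyOffsets` and `fixedOffsets` are hypothetical names, and plain `int`s stand in for the MLIR index values that the pass builds.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical (row, col) offset pair; the real pass builds MLIR index
// values, not plain ints.
struct Offs {
  int row;
  int col;
  bool operator==(const Offs &o) const { return row == o.row && col == o.col; }
};

// Anti-pattern (illustration only): the running offsets outlive a single
// iteration and the transpose swap is re-applied on every iteration, so
// values leak between the row and column slots.
std::vector<Offs> buggyOffsets(int numTiles, int tileSize, bool transpose) {
  assert(numTiles >= 0 && tileSize > 0);
  std::vector<Offs> out;
  int rowOffs = 0, colOffs = 0;
  for (int i = 0; i < numTiles; ++i) {
    colOffs = i * tileSize;        // step along the untransposed column dim
    if (transpose)
      std::swap(rowOffs, colOffs); // BUG: executed once per iteration
    out.push_back({rowOffs, colOffs});
  }
  return out;
}

// Fixed pattern (hypothetical): choose the stepping direction once,
// before the loop, so the swap happens exactly one time.
std::vector<Offs> fixedOffsets(int numTiles, int tileSize, bool transpose) {
  assert(numTiles >= 0 && tileSize > 0);
  int rowStep = 0, colStep = tileSize;
  if (transpose)
    std::swap(rowStep, colStep); // swapped exactly once
  std::vector<Offs> out;
  for (int i = 0; i < numTiles; ++i)
    out.push_back({i * rowStep, i * colStep});
  return out;
}
```

For two sub-tiles, both functions return (0, 0), (16, 0). This is consistent with the old 32-row test tile not catching the bug. For four sub-tiles, the buggy version returns (32, 16) and (48, 32) for the last two entries instead of (32, 0) and (48, 0). These exact wrong values need not match what the real pass emitted, because the sketch only models the repeated swap.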


dchigarev · 2024-11-15T15:13:35Z

test/mlir/test/gc/Transforms/GPU/linalg-to-xegpu-dpas-transpose.mlir

-      %subview_1 = memref.subview %arg0[%arg3, 0] [32, 1024] [1, 1] : memref<1024x1024xf16> to memref<32x1024xf16, strided<[1024, 1], offset: ?>>
-      %subview_2 = memref.subview %arg1[%arg4, 0] [32, 1024] [1, 1] : memref<1024x1024xf16> to memref<32x1024xf16, strided<[1024, 1], offset: ?>>
-      linalg.matmul_transpose_b ins(%subview_1, %subview_2 : memref<32x1024xf16, strided<[1024, 1], offset: ?>>, memref<32x1024xf16, strided<[1024, 1], offset: ?>>) outs(%subview_0 : memref<32x32xf16, strided<[1024, 1], offset: ?>>)
+    scf.parallel (%arg3, %arg4) = (%c0, %c0) to (%c1024, %c1024) step (%c16, %c64) {


Increased the tile size along the Y axis to exercise the problematic case.
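The arithmetic behind the updated CHECK lines can be shown in a small sketch. The sketch assumes two things. First, B is split into 16-row sub-tiles, which is the step visible in the `[%c0, %c0]`, `[%c16, %c0]` offsets. Second, the B subview now spans 64 rows, matching the `%c64` step of `%arg4`. Under these assumptions, the old 32-row tile yields only two sub-tiles, and a 64-row tile yields four. `expectedSubTileRowOffsets` is a hypothetical helper and is not part of the test suite.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: row offsets of the B sub-tiles that the lowering is
// expected to address with xegpu.update_nd_offset, assuming the B tile is
// cut into equal slices of subTileRows rows along its first dimension.
std::vector<int> expectedSubTileRowOffsets(int tileRows, int subTileRows) {
  assert(subTileRows > 0 && tileRows % subTileRows == 0);
  std::vector<int> offs;
  for (int r = 0; r < tileRows; r += subTileRows)
    offs.push_back(r);
  return offs;
}
```

`expectedSubTileRowOffsets(64, 16)` gives 0, 16, 32, 48, one offset per `update_nd_offset` CHECK line for `tB` through `tB3`. The old 32-row tile only reaches 0 and 16.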

dchigarev · 2024-11-15T15:14:33Z

test/mlir/test/gc/Transforms/GPU/linalg-to-xegpu-dpas-transpose.mlir

 // CHECK: %[[tB:.+]] = xegpu.update_nd_offset %[[rootB]], [%c0, %c0]
 // CHECK: %[[tB1:.+]] = xegpu.update_nd_offset %[[rootB]], [%c16, %c0]
+// CHECK: %[[tB2:.+]] = xegpu.update_nd_offset %[[rootB]], [%c32, %c0]
+// CHECK: %[[tB3:.+]] = xegpu.update_nd_offset %[[rootB]], [%c48, %c0]


Before the fix, it produced something like:

xegpu.update_nd_offset %[[rootB]], [%c0, %c0]
xegpu.update_nd_offset %[[rootB]], [%c16, %c0]
xegpu.update_nd_offset %[[rootB]], [%c32, %c16]
xegpu.update_nd_offset %[[rootB]], [%c16, %c32]

Fix matmul.transpose_b for big tiles

54f86c2


dchigarev commented Nov 15, 2024


clang-format

f83a0cd


dchigarev commented Nov 15, 2024


dchigarev marked this pull request as ready for review November 15, 2024 15:15

dchigarev requested review from AndreyPavlenko and kurapov-peter November 15, 2024 15:15

AndreyPavlenko approved these changes Nov 15, 2024


kurapov-peter approved these changes Nov 16, 2024


dchigarev merged commit 8b64109 into intel:main Nov 18, 2024
6 checks passed

3 participants
